Complexity of Word Collocation Networks: A Preliminary Structural Analysis

Author

  • Shibamouli Lahiri
Abstract

In this paper, we explore complex network properties of word collocation networks (Ferret, 2002) from four different genres. Each document of a particular genre was converted into a network of words with word collocations as edges. We analyzed graphically and statistically how the global properties of these networks varied across different genres, and among different network types within the same genre. Our results indicate that the distributions of network properties are visually similar but statistically distinct across different genres, and interesting variations emerge when we consider different network types within a single genre. We further investigate how the global properties change as we add progressively more collocation edges to the graph of one particular genre, and observe that, except for the number of vertices and the size of the largest connected component, network properties change in phases, via jumps and drops.
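As a rough illustration of the kind of construction the abstract describes, the sketch below turns a small document into a word network and reports a few global properties. It is not the paper's method: the function names, the simple punctuation-stripping tokenizer, the `window` parameter, and the choice to link only words within the same sentence are all assumptions made for this example.

```python
from collections import defaultdict


def build_collocation_network(sentences, window=1):
    """Build an undirected word network (hypothetical helper).

    Assumption: two words are linked when they occur within `window`
    positions of each other inside the same sentence.
    """
    adjacency = defaultdict(set)
    for sentence in sentences:
        # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
        tokens = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
        tokens = [w for w in tokens if w]
        for i, word in enumerate(tokens):
            adjacency[word]  # register the word as a vertex even if isolated
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                other = tokens[j]
                if other != word:  # skip self-loops
                    adjacency[word].add(other)
                    adjacency[other].add(word)
    return dict(adjacency)


def largest_component_size(adjacency):
    """Size of the largest connected component, via iterative depth-first search."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def global_properties(adjacency):
    """Return a few global properties of the network as a dictionary."""
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "vertices": n,
        "edges": m,
        "density": density,
        "largest_component": largest_component_size(adjacency),
    }


if __name__ == "__main__":
    doc = ["The cat sat.", "The cat ran.", "Dogs bark."]
    print(global_properties(build_collocation_network(doc, window=1)))
```

Calling `build_collocation_network` with increasing `window` values is one simple way to watch how such properties shift as more edges are added, though the paper's own edge-adding procedure may differ.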


Similar resources

Disambiguating Noun Compounds

This paper is concerned with the interaction between word sense disambiguation and the interpretation of noun compounds (NCs) in English. We develop techniques for disambiguating word sense specifically in NCs, and then investigate whether word sense information can aid in the semantic relation interpretation of NCs. To disambiguate word sense, we combine the one sense per collocation heuristic...


A Comparison of Co-occurrence and Similarity Measures as Simulations of Context

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words and achieve a simulation of semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, it is necessary to select a proper significance, a similarity measure...


Word Segmentation for Urdu OCR System

This paper presents a technique for word segmentation for the Urdu OCR system. Word segmentation, or word tokenization, is a preliminary task for understanding the meanings of sentences in Urdu language processing. Several techniques are available for word segmentation in other languages, but not much work has been done on word segmentation for Urdu Optical Character Recognition (OCR) systems. A me...


Multiple solutions of a nonlinear reactive transport model using least square pseudo-spectral collocation method

Recognizing and calculating all branches of solutions of nonlinear boundary value problems is clearly difficult. The complexity of this issue stems from the nonlinearity of the problem. In this regard, this paper considers a steady-state reactive transport model which does not have an exact closed-form solution and discovers the existence of dual or triple solutions in so...


Evolutionary Graph Clustering using Graph and Cluster Mixtures

Many networks, and accordingly their representations as graphs, are subject to structural changes during the course of their existence [CKT06]. Examples of such evolutionary networks include friendship networks in online communities, co-authorship networks in the scientific domain, and collocation networks in computational linguistics. Studying the evolution of such networks can provide vital data...




Publication date: 2014